Cobb County
A Nonparametric Adaptive EWMA Control Chart for Binary Monitoring of Multiple Stream Processes
Muritala, Faruk, Brown, Austin, Ghosh, Dhrubajyoti, Ni, Sherry
Monitoring binomial proportions across multiple independent streams is a critical challenge in Statistical Process Control (SPC), with applications from manufacturing to cybersecurity. While EWMA charts offer sensitivity to small shifts, existing implementations rely on asymptotic variance approximations that fail during early-phase monitoring. We introduce a Cumulative Standardized Binomial EWMA (CSB-EWMA) chart that overcomes this limitation by deriving the exact time-varying variance of the EWMA statistic for binary multiple-stream data, enabling adaptive control limits that ensure statistical rigor from the first sample. Through extensive simulations, we identify optimal smoothing (λ) and limit (L) parameters to achieve target in-control average run length (ARL0) of 370 and 500. The CSB-EWMA chart demonstrates rapid shift detection across both ARL0 targets, with out-of-control average run length (ARL1) dropping to 3-7 samples for moderate shifts (δ=0.2), and exhibits exceptional robustness across different data distributions, with low ARL1 Coefficients of Variation (CV < 0.10 for small shifts) for both ARL0 = 370 and 500. This work provides practitioners with a distribution-free, sensitive, and theoretically sound tool for early change detection in binomial multiple-stream processes.
- North America > United States > New Jersey > Hudson County > Hoboken (0.05)
- North America > United States > New York (0.04)
- North America > United States > Georgia > Cobb County > Kennesaw (0.04)
- Europe > United Kingdom > England (0.04)
- Information Technology > Security & Privacy (0.89)
- Government > Military > Cyberwarfare (0.35)
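The time-varying control limits mentioned in the CSB-EWMA abstract above can be illustrated with the textbook EWMA chart for a single stream of standardized binomial proportions. This is a minimal sketch under stated assumptions, not the authors' CSB-EWMA derivation for multiple streams: the function names, the single-stream setup, and the default values of lam and L are all hypothetical placeholders rather than the optimized parameters reported in the paper.

```python
import math


def ewma_limit(lam, L, t, sigma=1.0):
    """Half-width of the EWMA control limit at sample t (t >= 1).

    Uses the exact variance of Z_t for independent inputs with variance
    sigma^2 and Z_0 = 0:
        Var(Z_t) = sigma^2 * lam / (2 - lam) * (1 - (1 - lam)^(2t))
    As t grows this approaches the usual asymptotic value.
    """
    var = sigma ** 2 * (lam / (2.0 - lam)) * (1.0 - (1.0 - lam) ** (2 * t))
    return L * math.sqrt(var)


def ewma_chart(counts, n, p0, lam=0.1, L=2.7):
    """Run a two-sided EWMA chart on binomial counts.

    counts: number of nonconforming items in each sample of size n
    p0:     assumed in-control proportion
    lam, L: illustrative smoothing and limit parameters (assumptions)
    Returns a list of (t, z_t, limit_t, signal) tuples.
    """
    sd0 = math.sqrt(p0 * (1.0 - p0) / n)  # in-control std. dev. of x/n
    z = 0.0
    rows = []
    for t, x in enumerate(counts, start=1):
        y = (x / n - p0) / sd0            # standardized proportion
        z = lam * y + (1.0 - lam) * z     # EWMA recursion
        h = ewma_limit(lam, L, t)         # limit widens with t
        rows.append((t, z, h, abs(z) > h))
    return rows


if __name__ == "__main__":
    for row in ewma_chart([5, 6, 4, 9, 11, 12], n=50, p0=0.1):
        print(row)
```

At t = 1 the limit equals L * lam * sigma, which is much tighter than the asymptotic limit L * sigma * sqrt(lam / (2 - lam)); using the asymptotic value from the first sample is what makes a chart overly wide early on.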
Privacy-Preserving Generative Modeling and Clinical Validation of Longitudinal Health Records for Chronic Disease
Ballyk, Benjamin D., Gupta, Ankit, Konda, Sujay, Subramanian, Kavitha, Landon, Chris, Naseer, Ahmed Ammar, Maierhofer, Georg, Swaminathan, Sumanth, Venkateshwaran, Vasudevan
Data privacy is a critical challenge in modern medical workflows as the adoption of electronic patient records has grown rapidly. Stringent data protection regulations limit access to clinical records for training and integrating machine learning models that have shown promise in improving diagnostic accuracy and personalized care outcomes. Synthetic data offers a promising alternative; however, current generative models either struggle with time-series data or lack formal privacy guarantees. In this paper, we enhance a state-of-the-art time-series generative model to better handle longitudinal clinical data while incorporating quantifiable privacy safeguards. Using real data from chronic kidney disease and ICU patients, we evaluate our method through statistical tests, a Train-on-Synthetic-Test-on-Real (TSTR) setup, and expert clinical review. Our non-private model (Augmented TimeGAN) outperforms transformer- and flow-based models on statistical metrics in several datasets, while our private model (DP-TimeGAN) maintains a mean authenticity of 0.778 on the CKD dataset, outperforming existing state-of-the-art models on the privacy-utility frontier. Both models achieve performance comparable to real data in clinician evaluations, providing robust input data necessary for developing models for complex chronic conditions without compromising data privacy.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- (11 more...)
- Research Report > Experimental Study (0.66)
- Research Report > New Finding (0.46)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.64)
Enhancing Breast Cancer Prediction with LLM-Inferred Confounders
Wheeler High School, Marietta, GA. Abstract: This study enhances breast cancer prediction by using large language models to infer the likelihood of confounding diseases, namely diabetes, obesity, and cardiovascular disease, from routine clinical data. These AI-generated features improved Random Forest model performance, particularly for LLMs like Gemma (3.9%) and Llama (6.4%). The approach shows promise for noninvasive prescreening and clinical integration, supporting improved early detection and shared decision-making in breast cancer diagnosis. Introduction: Breast cancer (BC) is a leading cause of death among women in the U.S., with most cases having unknown causes despite known risk factors [1]. Researchers have identified correlations between BC and various clinical features and biomarkers, such as body mass index, glucose, insulin, leptin, adiponectin, resistin, MCP-1, and HOMA, that can be measured through routine blood tests.
Interpretable Machine Learning for Cognitive Aging: Handling Missing Data and Uncovering Social Determinant
Mao, Xi, Wang, Zhendong, Li, Jingyu, Mao, Lingchao, Essien, Utibe, Wang, Hairong, Ni, Xuelei Sherry
Early detection of Alzheimer's disease (AD) is crucial because its neurodegenerative effects are irreversible, and neuropathologic and social-behavioral risk factors accumulate years before diagnosis. Identifying higher-risk individuals earlier enables prevention, timely care, and equitable resource allocation. We predict cognitive performance from social determinants of health (SDOH) using the NIH NIA-supported PREPARE Challenge Phase 2 dataset derived from the nationally representative Mex-Cog cohort of the 2003 and 2012 Mexican Health and Aging Study (MHAS). Data: The target is a validated composite cognitive score across seven domains (orientation, memory, attention, language, constructional praxis, and executive function) derived from the 2016 and 2021 MHAS waves. Predictors span demographic, socioeconomic, health, lifestyle, psychosocial, and healthcare access factors. Methodology: Missingness was addressed with a singular value decomposition (SVD)-based imputation pipeline treating continuous and categorical variables separately. This approach leverages latent feature correlations to recover missing values while balancing reliability and scalability. After evaluating multiple methods, XGBoost was chosen for its superior predictive performance. Results and Discussion: The framework outperformed existing methods and the data challenge leaderboard, demonstrating high accuracy, robustness, and interpretability. SHAP-based post hoc analysis identified top contributing SDOH factors and age-specific feature patterns. Notably, flooring material emerged as a strong predictor, reflecting socioeconomic and environmental disparities. Other influential factors, including age, SES, lifestyle, social interaction, sleep, stress, and BMI, underscore the multifactorial nature of cognitive aging and the value of interpretable, data-driven SDOH modeling.
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Georgia > Cobb County > Marietta (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
Efficient Hate Speech Detection: Evaluating 38 Models from Traditional Methods to Transformers
Abusaqer, Mahmoud, Saquer, Jamil, Shatnawi, Hazim
The proliferation of hate speech on social media necessitates automated detection systems that balance accuracy with computational efficiency. This study evaluates 38 model configurations in detecting hate speech across datasets ranging from 6.5K to 451K samples. We analyze transformer architectures (e.g., BERT, RoBERTa, Distil-BERT), deep neural networks (e.g., CNN, LSTM, GRU, Hierarchical Attention Networks), and traditional machine learning methods (e.g., SVM, CatBoost, Random Forest). Our results show that transformers, particularly RoBERTa, consistently achieve superior performance with accuracy and F1-scores exceeding 90%. Among deep learning approaches, Hierarchical Attention Networks yield the best results, while traditional methods like CatBoost and SVM remain competitive, achieving F1-scores above 88% with significantly lower computational costs. Additionally, our analysis highlights the importance of dataset characteristics, with balanced, moderately sized unprocessed datasets outperforming larger, preprocessed datasets. These findings offer valuable insights for developing efficient and effective hate speech detection systems.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
- (23 more...)
A Geometric Graph-Based Deep Learning Model for Drug-Target Affinity Prediction
Rana, Md Masud, Mukta, Farjana Tasnim, Nguyen, Duc D.
In structure-based drug design, accurately estimating the binding affinity between a candidate ligand and its protein receptor is a central challenge. Recent advances in artificial intelligence, particularly deep learning, have demonstrated superior performance over traditional empirical and physics-based methods for this task, enabled by the growing availability of structural and experimental affinity data. In this work, we introduce DeepGGL, a deep convolutional neural network that integrates residual connections and an attention mechanism within a geometric graph learning framework. By leveraging multiscale weighted colored bipartite subgraphs, DeepGGL effectively captures fine-grained atom-level interactions in protein-ligand complexes across multiple scales. We benchmarked DeepGGL against established models on CASF-2013 and CASF-2016, where it achieved state-of-the-art performance with significant improvements across diverse evaluation metrics. To further assess robustness and generalization, we tested the model on the CSAR-NRC-HiQ dataset and the PDBbind v2019 holdout set. DeepGGL consistently maintained high predictive accuracy, highlighting its adaptability and reliability for binding affinity prediction in structure-based drug discovery.
- North America > United States > Tennessee > Knox County > Knoxville (0.14)
- North America > United States > Georgia > Cobb County > Kennesaw (0.04)
A Survey: Towards Privacy and Security in Mobile Large Language Models
Xu, Honghui, Li, Kaiyang, Chen, Wei, Zheng, Danyang, Li, Zhiyuan, Cai, Zhipeng
Mobile Large Language Models (LLMs) are revolutionizing diverse fields such as healthcare, finance, and education with their ability to perform advanced natural language processing tasks on-the-go. However, the deployment of these models in mobile and edge environments introduces significant challenges related to privacy and security due to their resource-intensive nature and the sensitivity of the data they process. This survey provides a comprehensive overview of privacy and security issues associated with mobile LLMs, systematically categorizing existing solutions such as differential privacy, federated learning, and prompt encryption. Furthermore, we analyze vulnerabilities unique to mobile LLMs, including adversarial attacks, membership inference, and side-channel attacks, offering an in-depth comparison of their effectiveness and limitations. To bridge this gap, we propose potential applications, discuss open challenges, and suggest future research directions, paving the way for the development of trustworthy, privacy-compliant, and scalable mobile LLM systems. The advent of mobile Large Language Models (LLMs) represents a significant milestone in the evolution of AI, enabling advanced natural language processing capabilities to be deployed in mobile and edge environments [1]-[3]. By bringing powerful AI tools closer to end-users, mobile LLMs are revolutionizing industries such as healthcare [4], finance [5], and education [6] with real-time, on-device applications.
- North America > United States > Massachusetts (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Georgia > Cobb County > Marietta (0.04)
- (3 more...)
- Overview (1.00)
- Research Report > Promising Solution (0.46)
Efficient and Scalable Estimation of Distributional Treatment Effects with Multi-Task Neural Networks
Hirata, Tomu, Byambadalai, Undral, Oka, Tatsushi, Yasui, Shota, Uto, Shingo
We propose a novel multi-task neural network approach for estimating distributional treatment effects (DTE) in randomized experiments. While DTE provides more granular insights into the experiment outcomes over conventional methods focusing on the Average Treatment Effect (ATE), estimating it with regression adjustment methods presents significant challenges. Specifically, precision in the distribution tails suffers due to data imbalance, and computational inefficiencies arise from the need to solve numerous regression problems, particularly in large-scale datasets commonly encountered in industry. To address these limitations, our method leverages multi-task neural networks to estimate conditional outcome distributions while incorporating monotonic shape constraints and multi-threshold label learning to enhance accuracy. To demonstrate the practical effectiveness of our proposed method, we apply our method to both simulated and real-world datasets, including a randomized field experiment aimed at reducing water consumption in the US and a large-scale A/B test from a leading streaming platform in Japan. The experimental results consistently demonstrate superior performance across various datasets, establishing our method as a robust and practical solution for modern causal inference applications requiring a detailed understanding of treatment effect heterogeneity.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Georgia > Cobb County (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
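To make the quantity in the abstract above concrete: a distributional treatment effect at a threshold y is commonly taken as the difference between the treated and control outcome CDFs at y. The following is a minimal sketch of the plain empirical estimator in a randomized experiment, without the regression adjustment, multi-task network, or shape constraints the paper proposes; the function names and the toy data are hypothetical.

```python
def empirical_cdf(values, threshold):
    """Share of observations at or below the threshold."""
    return sum(v <= threshold for v in values) / len(values)


def distributional_treatment_effect(treated, control, thresholds):
    """Difference of empirical CDFs, F_treated(y) - F_control(y), per threshold.

    Assumes random assignment, so the raw difference in empirical CDFs
    is a valid (if noisy) estimate at each threshold.
    """
    return [
        empirical_cdf(treated, y) - empirical_cdf(control, y)
        for y in thresholds
    ]


if __name__ == "__main__":
    # Toy outcomes (assumption): treatment shifts outcomes down by one unit.
    treated = [1, 2, 3, 4]
    control = [2, 3, 4, 5]
    print(distributional_treatment_effect(treated, control, [1, 3, 5]))
```

Each threshold is a separate estimation problem here, which hints at why the abstract describes solving many regressions as a computational burden and why estimates in the tails, where few observations fall, are the least precise.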
An Uncertainty-Aware Dynamic Decision Framework for Progressive Multi-Omics Integration in Classification Tasks
Mu, Nan, Yang, Hongbo, Zhao, Chen
Background and Objective: High-throughput multi-omics technologies have proven invaluable for elucidating disease mechanisms and enabling early diagnosis. However, the high cost of multi-omics profiling imposes a significant economic burden, with overreliance on full omics data potentially leading to unnecessary resource consumption. To address these issues, we propose an uncertainty-aware, multi-view dynamic decision framework for omics data classification that aims to achieve high diagnostic accuracy while minimizing testing costs. Methodology: At the single-omics level, we refine the activation functions of neural networks to generate Dirichlet distribution parameters, utilizing subjective logic to quantify both the belief masses and uncertainty mass of classification results. Belief mass reflects the support of a specific omics modality for a disease class, while the uncertainty parameter captures limitations in data quality and model discriminability, providing a more trustworthy basis for decision-making. At the multi-omics level, we employ a fusion strategy based on Dempster-Shafer theory to integrate heterogeneous modalities, leveraging their complementarity to boost diagnostic accuracy and robustness. A dynamic decision mechanism is then applied in which omics data are incrementally introduced for each patient until either all data sources are utilized or the model confidence exceeds a predefined threshold. Results and Conclusion: We evaluate our approach on four benchmark multi-omics datasets, ROSMAP, LGG, BRCA, and KIPAN. In three datasets, over 50% of cases achieved accurate classification using a single omics modality, effectively reducing redundant testing. Meanwhile, our method maintains diagnostic performance comparable to full-omics models and preserves essential biological insights.
- Asia > China > Sichuan Province > Chengdu (0.04)
- North America > United States > Georgia > Cobb County > Marietta (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.93)
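The pipeline described in the multi-omics abstract above (Dirichlet evidence, subjective-logic opinions, Dempster-Shafer fusion, and an early-stopping decision rule) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it uses the reduced Dempster combination rule commonly applied to subjective-logic opinions, takes 1 - uncertainty as the confidence measure, and all function names, the threshold value, and the toy evidence vectors are hypothetical.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (belief masses, uncertainty).

    Dirichlet parameters are alpha_k = e_k + 1 with strength S = sum(alpha);
    then b_k = e_k / S and u = K / S, so sum(b) + u == 1.
    """
    k = len(evidence)
    strength = sum(evidence) + k
    return [e / strength for e in evidence], k / strength


def combine(op1, op2):
    """Reduced Dempster combination of two opinions over the same classes."""
    (b1, u1), (b2, u2) = op1, op2
    n = len(b1)
    # Conflict: mass the two sources assign to different classes.
    conflict = sum(b1[i] * b2[j] for i in range(n) for j in range(n) if i != j)
    scale = 1.0 - conflict
    beliefs = [(b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) / scale for k in range(n)]
    return beliefs, (u1 * u2) / scale


def progressive_decision(evidence_per_modality, threshold=0.8):
    """Add modalities one at a time until confidence (1 - u) reaches threshold.

    Returns (predicted class index, modalities used, final uncertainty).
    """
    if not evidence_per_modality:
        raise ValueError("at least one modality is required")
    fused = None
    used = 0
    for evidence in evidence_per_modality:
        opinion = opinion_from_evidence(evidence)
        fused = opinion if fused is None else combine(fused, opinion)
        used += 1
        if 1.0 - fused[1] >= threshold:
            break  # confident enough: skip the remaining (costly) modalities
    beliefs, uncertainty = fused
    label = max(range(len(beliefs)), key=beliefs.__getitem__)
    return label, used, uncertainty


if __name__ == "__main__":
    # Toy evidence for two classes from three hypothetical omics modalities.
    print(progressive_decision([[2.0, 1.0], [6.0, 1.0], [9.0, 0.5]]))
```

Because uncertainty shrinks multiplicatively when sources are fused, a strong first modality can end the loop immediately, which is the mechanism behind the reported reduction in redundant testing.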
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech
Li, Jingyu, Mao, Lingchao, Wang, Hairong, Wang, Zhendong, Mao, Xi, Ni, Xuelei Sherry
Background: Alzheimer's disease and related dementias (ADRD) are progressive neurodegenerative conditions where early detection is vital for timely intervention and care. Spontaneous speech contains rich acoustic and linguistic markers that may serve as non-invasive biomarkers for cognitive decline. Foundation models, pre-trained on large-scale audio or text data, produce high-dimensional embeddings encoding contextual and acoustic features. Methods: We used the PREPARE Challenge dataset, which includes audio recordings from over 1,600 participants with three cognitive statuses: healthy control (HC), mild cognitive impairment (MCI), and Alzheimer's Disease (AD). We excluded non-English, non-spontaneous, or poor-quality recordings. The final dataset included 703 (59.13%) HC, 81 (6.81%) MCI, and 405 (34.06%) AD cases. We benchmarked a range of open-source foundation speech and language models to classify cognitive status into the three categories. Results: The Whisper-medium model achieved the highest performance among speech models (accuracy = 0.731, AUC = 0.802). Among language models, BERT with pause annotation performed best (accuracy = 0.662, AUC = 0.744). ADRD detection using state-of-the-art automatic speech recognition (ASR) model-generated audio embeddings outperformed others. Including non-semantic features like pause patterns consistently improved text-based classification. Conclusion: This study introduces a benchmarking framework using foundation models and a clinically relevant dataset. Acoustic-based approaches, particularly ASR-derived embeddings, demonstrate strong potential for scalable, non-invasive, and cost-effective early detection of ADRD.
- North America > United States > Texas > Hidalgo County > Edinburg (0.04)
- North America > United States > Texas > Cameron County > Brownsville (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)